Learning object identification rules for information integration
Abstract
When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching objects. Certain attributes are more important than others for deciding whether a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules, or mapping rules, for determining the mappings between objects, as well as domain-dependent transformations for recognizing format inconsistencies. This manual process is time-consuming and error-prone. In our approach, Active Atlas learns to simultaneously tailor both mapping rules and a set of general transformations to a specific application domain, through limited user input. The experimental results demonstrate that we achieve higher accuracy and require less user involvement than previous methods across various application domains.
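The idea of comparing shared attributes, with some attributes counting more than others, can be illustrated with a minimal Python sketch. This is not the Active Atlas implementation: the tokenizing transformation, the Jaccard similarity measure, the fixed weights, the threshold, and the restaurant records are all assumptions invented for illustration, and the weights here are set by hand rather than learned from user input as the abstract describes.

```python
import re

def tokens(text):
    """Assumed general transformation: lowercase and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def attribute_similarity(a, b):
    """Jaccard overlap of the token sets of two attribute values (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def match_score(obj_a, obj_b, weights):
    """Weighted average of per-attribute similarities over the shared attributes."""
    shared = [k for k in weights if k in obj_a and k in obj_b]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0
    weighted = sum(weights[k] * attribute_similarity(obj_a[k], obj_b[k]) for k in shared)
    return weighted / total

def is_match(obj_a, obj_b, weights, threshold=0.5):
    """Hypothetical mapping rule: declare a match when the score reaches the threshold."""
    return match_score(obj_a, obj_b, weights) >= threshold

# Invented example records from two hypothetical sites.
record_a = {"name": "Art's Deli", "street": "12224 Ventura Blvd.", "phone": "818-762-1221"}
record_b = {"name": "Art's Delicatessen", "street": "12224 Ventura Boulevard", "phone": "818/762-1221"}
record_c = {"name": "Cafe Bizou", "street": "14016 Ventura Blvd.", "phone": "818-788-3536"}

# Hand-picked weights standing in for learned attribute importance.
weights = {"name": 0.5, "street": 0.2, "phone": 0.3}

print(is_match(record_a, record_b, weights))
print(is_match(record_a, record_c, weights))
```

In this sketch, the phone numbers of the first two records differ only in punctuation, so the tokenizing step makes them identical, while the name and street overlap only partially. A learning system of the kind the abstract describes would adjust both the transformations and the weights from labeled examples instead of fixing them in advance.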
Similar resources
Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest
This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...
Improvement of Navigation Accuracy using Tightly Coupled Kalman Filter
In this paper, a mechanism is designed for integration of inertial navigation system information (INS) and global positioning system information (GPS). In this type of system a series of mathematical and filtering algorithms with Tightly Coupled techniques with several objectives such as application of integrated navigation algorithms, precise calculation of flying object position, speed and at...
Radio Frequency Identification (RFID): A Technology for Enhancing Computerized Maintenance System (CMMS)
While Computerized Maintenance Management System (CMMS) enables maintenance managers and supervisors to access information about equipment, manpower and maintenance policies, there is still a need to facilitate getting data/information into the backend database where it can be utilized by the organization as information to make decisions regarding the operation of the organization. Si...
Data Integration by means of Object Identification in Information Systems
Data integration is an important topic in the information age. Although structural aspects are widely investigated, there is a lack of research on semantic discrepancies between data sources. Data integration should be able to handle input errors such as erroneous data and misspellings. Also problems like domain and data type mismatch, of missing values and duplicated records need investigation...
Semantic Data Integration across Different Scales: Automatic Learning of Generalization Rules
In this paper we present an approach realizing the integration of data sets of different origin and with different resolution levels. The underlying idea is to reveal semantic correspondences between object classes of different geo-ontologies only by analysis of spatial and geometrical characteristics of instances of the data sets. As a result we derive transformation rules with Data Mining met...
Journal title:
- Inf. Syst.
Volume 26 Issue
Pages -
Publication date 2001